Zero-Shot Visual Question Answering Using Knowledge Graph
Abstract
Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when some component does not perform well, which leads to error cascading and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers, and meanwhile dramatically augment existing end-to-end models on the normal F-VQA task.
Similar resources
Zero-Shot Visual Question Answering
Part of the appeal of Visual Question Answering (VQA) is its promise to answer new questions about previously unseen images. Most current methods demand training questions that illustrate every possible concept, and will therefore never achieve this capability, since the volume of required training data would be prohibitive. Answering general questions about images requires methods capable of Z...
Constraint-Based Question Answering with Knowledge Graph
WebQuestions and SimpleQuestions are two benchmark data-sets commonly used in recent knowledge-based question answering (KBQA) work. Most questions in them are ‘simple’ questions which can be answered based on a single relation in the knowledge base. Such data-sets lack the capability of evaluating KBQA systems on complicated questions. Motivated by this issue, we release a new data-set, namely...
Visual Question Answering Using Various Methods
This project applies deep learning tools to enable a computer to answer questions by looking at images. In this project, the visual question answering dataset[1] is introduced. This dataset consists of 204,721 real images with 614,164 questions and 50,000 abstract scenes with 150,000 questions. Various methods are reproduced. An analysis of the different models is presented.
Visual Question Answering using Deep Learning
Multimodal learning between images and language has gained the attention of researchers over the past few years. Using recent deep learning techniques, specifically end-to-end trainable artificial neural networks, performance in tasks like automatic image captioning and bidirectional sentence and image retrieval has been significantly improved. Recently, as a further exploration of present artificial...
Zero-shot Visual Imitation
Existing approaches to imitation learning distill both what to do—goals—and how to do it—skills—from expert demonstrations. This expertise is effective but expensive supervision: it is not always practical to collect many detailed demonstrations. We argue that if an agent has access to its environment along with the expert, it can learn skills from its own experience and rely on expertise for t...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-88361-4_9